Degraded Script Identification for Indian Language- A Survey

Authors

Manoj Kumar Shukla

Haider Banka

S. N. Srihari

C. Y. Suen

R. Legault

C. Nadal

M. Cheriet

Abstract

The performance of any Optical Character Recognition (OCR) system depends largely on the printing and paper quality of the input document image. A number of OCR techniques are available and claim high recognition accuracy on printed document images in Indian and foreign scripts. Only a few reports have been found on the recognition of degraded Indian-language documents. Degradation in a scanned printed document can take many forms. In this paper, we present a survey of degraded script identification for Indian-language documents.

Similar Resources

Script Identification for Document Image Retrieval: A Survey

In recent years, with advances in computer technology, many multimedia documents have been captured and stored, and hence the demand for recognizing and retrieving such documents has increased tremendously. In such an environment, the large volume of data and the variety of scripts make manual identification unworkable. In such cases the ability to automatically determine the script, and further the...

AmritaCEN_NLP @ FIRE 2015 Language Identification for Indian Languages in Social Media Text

The growth of social media content, such as Twitter and Facebook messages and blog posts, has created many new opportunities for language technology. User-generated content such as tweets and blogs in most languages is written in Roman script owing to distinct social, cultural, and technological factors. Some users write in their own language's script or in mixed script. The primary challenges ...

Script Identification from Bilingual Gujarati-English Documents

In a multilingual country like India, English words are often interspersed with Indian regional-language text in official papers, school textbooks, and magazines. A bilingual Optical Character Recognition (OCR) system is therefore needed that can recognize these bilingual documents and store them for future use. In this paper, the authors present an OCR system developed for the script ...

Labeling of Query Words using Conditional Random Field

This paper describes our approach to Query Word Labeling as part of the shared task on Mixed Script Information Retrieval at the Forum for Information Retrieval Evaluation (FIRE) 2015. The queries are written in Roman script, and the words are either English or transliterated from Indian regional languages. A total of eight Indian languages were present in addition to English. We also identified the...

Hindi-English Language Identification, Named Entity Recognition and Back Transliteration: Shared Task System Description

This paper presents an algorithm for word-level language identification, named entity recognition and classification, and transliteration of Indian-language words written in Roman script back to their native Devanagari script from bilingual textual data. We propose the construction of an extensive, hierarchically structured dictionary and a hierarchical rule-based classifier to expedite word search ...

Publication date: 2014
